DETAILED ACTION

Information Disclosure Statement 

1. The Information Disclosure Statement filed 04 November 2008 fails to comply
with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609 because it does not
represent prior art, but is only attorney work-product produced for purposes of litigation. 

2. The Information Disclosure Statement filed 22 October 2008 fails to comply with 
the provisions of 37 CFR 1.97, 1.98 and MPEP § 609 because it is a duplicate of
another Information Disclosure Statement filed 22 October 2008. Accordingly, only one 
of the two Information Disclosure Statements filed 22 October 2008 is being considered; 
the other is not being entered. 

Claim Rejections - 35 USC §112 

3. The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall
set forth the best mode contemplated by the inventor of carrying out his invention.

4. Claims 14, 56, 72, and 74 are rejected under 35 U.S.C. 112, first paragraph, as 
failing to comply with the written description requirement. The claims contain subject 
matter which was not described in the specification in such a way as to reasonably 
convey to one skilled in the relevant art that the inventors, at the time the application 
was filed, had possession of the claimed invention. 

Claims 14 and 56 contain the limitation of computing compressed MFCC
coefficients at either the client system or the server computing system "on a connection
by connection basis", which limitation is not supported by a written description in the
originally-filed Specification. Applicants' Specification discloses optimizing computation
of MFCC coefficients on a case-by-case basis, ¶[0173], and a system-by-system basis,
¶[0174], of corresponding U.S. Patent Publication 2005/0080625, but not on a
connection-by-connection basis. The Specification, ¶[0153], expressly states that
initialization of a client system is comprised of three "separate initializing processes" for 
Speech Recognition Engine 220A, MS Agent 220B, and Communication Processes 
220C. Moreover, Figures 2A to 2D clearly show that the boxes representing SR 
initialization are separate and distinct from the boxes for opening and closing an Internet 
session. Thus, initialization of a client speech recognition engine is not disclosed to 
occur on a "connection-by-connection basis" between the opening and closing of the 
Internet connection, but appears to be an independent process occurring even before 
the Internet connection is established. 

Claims 72 and 74 contain the limitation of "said client system is a portable 
electronics device and a data content representative of speech data values is
configured based on a processing ability of such device", which limitation involves new
matter because it is not supported by the originally-filed Specification. Corresponding
U.S. Patent Publication 2005/0080625, ¶[0081], briefly discloses a client computing
system can be as simple as a personal digital assistant or a cell phone, but does not 
disclose configuring speech data values based upon a processing ability of a portable 
electronics device. The Specification, ¶[0173] to ¶[0174], discloses partitioning of signal
processing capabilities between the client and server side, but partitioning of signal 
processing capabilities is not disclosed for a portable electronics device. 

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains.
Patentability shall not be negatived by the manner in which the invention was made. 

6. Claims 11 to 13, 53 to 55, 71, and 73 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Barclay et al. in view of Lennig et al. 

Concerning independent claims 11 and 53, Barclay et al. discloses a client/server 
speech recognizer, comprising: 

"a first audio signal receiving routine for receiving user speech utterance signals 
representing speech utterances to be recognized during a sequence of speech 
utterance evaluation time frames, said speech utterances including sentences 
comprised of one or more words" - client system 2 has microphone 10 that accepts 
audio input, converts a sound into a digital representation by analog to digital 
conversion (ADC), and extracts a set of features that provide the best recognition 
(column 4, line 56 to column 5, line 10: Figure 1); speech is recognized as words and
sentences for an utterance, "Show me the weather for Boston" (column 3, lines 6 to 24), 
or "I want to fly from Boston to Denver tomorrow" (column 9, lines 16 to 25); continuous 
speech is divided into discrete transformed segments that facilitate mathematical
operations (column 2, lines 14 to 16); implicitly, these discrete segments are called 
'frames' ("a sequence of speech utterance evaluation time frames"); 

"a first signal processing routine adapted to generate representative speech data 
values for each speech utterance evaluation time frame during which speech utterance 
signals are received, [said representative speech data values including a set of 
compressed mel-frequency cepstral coefficients (MFCC)]" - client system 2 has front-end
12 that extracts a set of features ("representative speech data values"), called
cepstra, that provide the best recognition, and the features are quantized (column 5, 
lines 2 to 11: Figure 1); 

"a formatting routine for rendering said representative speech data values into a 
transmission format suitable for transmission from the client system over a 
communications channel to a second processing routine executing on the server 
computing system" - dispatcher program of the client 70 sends quantized speech 
features via a communications link to server computer system 80 (column 5, line 48 to 
column 6, line 6: Figure 4); a dispatcher implements transmission by TCP/IP protocol 
(column 6, lines 50 to 53); 

"wherein said representative speech data values are transmitted continuously 
during said speech utterances within streaming packets and without waiting for silence 
to be detected and/or said speech utterances to be completed" - the dispatcher does 
not wait for completion of logical blocks and the dispatcher itself is designed to quickly 
service the Internet with minimum overhead time (column 6, lines 56 to 59); front end 
streams the quantized data to the dispatcher, where 'stream' is defined as to send
substantially continuously the data in real-time (column 7, lines 13 to 19); features are 
sent as packets (column 7, lines 48 to 59), implicitly, in TCP/IP protocol, a most likely 
transcription or text is determined until an end-of-speech (EOS) is received (column 6, 
lines 5 to 8); thus, a dispatcher does not wait until an end-of-speech, i.e. silence, is 
detected to initiate transmission of quantized feature data; 

"further wherein said representative speech data values constitute a minimum 
amount of information that can be used by said second processing routine to complete 
accurate recognition of said one or more words and said sentences" - extracted sets of 
features, called cepstra, are those features that provide the best recognition as known 
in the art (column 5, lines 2 to 8); implicitly, cepstra are "a minimum amount of 
information that can be used" for speech recognition because they are a compressed 
form of the most relevant features for speech recognition; 

"and said communication channel further being configured by the machine 
executable program such that grammar related information is sent to the second 
processing routine executing on the server computing system for identifying a grammar 
to be used for recognition of said one or more words from said sentences" - a grammar 
is distributed and downloaded when a Web page for specific topics is entered; a
weather report page could have a grammar specific to words and phrases associated 
with the weather; a grammar that recognizes an utterance as "Show me the weather for 
Boston" yields a weather report for Boston (column 3, lines 6 to 24); thus, when a client 
selects a Web page for a weather report, it notifies a server to employ a grammar for 



Application/Control Number: 10/684,357 Page 7 

Art Unit: 2626 

recognizing words and phrases associated with the weather ("grammar related
information is sent to the second processing routine executing on the server computing
system").

Concerning independent claims 11 and 53, Barclay et al. discloses cepstra as
representative speech data values, but does not expressly disclose "said representative
speech data values including a set of compressed mel-frequency cepstral coefficients
(MFCC)". However, mel-frequency cepstral coefficients (MFCC) are a well known form
of cepstra in speech recognition. Specifically, Lennig et al. teaches a speech
recognizer, where the spectral representation of speech is mel-based cepstral and
dynamic components. (Abstract) Mel-frequency cepstrum may be used as short-term
spectral representations of primary parameters. (Column 1, Lines 5 to 18; Column 2,
Lines 9 to 17; Column 2, Lines 43 to 47; Column 6, Lines 1 to 11) An advantage is that
mel-based cepstral coefficients can represent a spectral shape and change in spectral 
shape, in addition to changes in loudness or amplitude, so that dynamic parameters can 
result in a reduction of recognition errors over a public switched telephone network of 
about 20%. (Column 2, Lines 42 to 49; Column 6, Lines 1 to 11) It would have been
obvious to one having ordinary skill in the art to utilize mel frequency cepstral 
coefficients as taught by Lennig et al. in a client/server speech recognizer of Barclay et 
al. for a purpose of obtaining a reduction of recognition errors over a network by 
representing dynamic parameters for changes in spectral shape and amplitude. 




Concerning claim 12, Barclay et al. discloses incorporating a dispatcher into a 
browser of a client side application for speech recognition at a server (column 8, lines 
36 to 64: Figure 4). 

Concerning claims 13 and 55, Lennig et al. discloses estimating a magnitude or
power spectrum for mel-frequency cepstra over a time frame between 2 and 50 ms
(column 2, lines 56 to 65; column 6, lines 43 to 48); 100 frames per second implies 1
frame = 1/100 second, or 1 x 10^-2 seconds = 10 x 10^-3 seconds, or about 10
milliseconds; thus, a 2 millisecond time frame for calculating cepstra corresponds to 
about 500 frames per second; energy is determined for a plurality of twenty channels 
spanning a frequency range from about 100 Hz to about 4000 Hz (column 3, lines 4 to 
11); each channel corresponds to "a corresponding frequency component", and a 
frequency range from 100 Hz to 4000 Hz is equivalent to "an audible speech frequency 
range." 
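
The frame-rate arithmetic in the preceding paragraph can be checked with a short sketch. The function names below are illustrative only and appear in neither reference; the sketch assumes back-to-back, non-overlapping frames, so that the frame rate is simply the reciprocal of the frame duration.

```python
def frames_per_second(frame_ms: float) -> float:
    """Frame rate for non-overlapping frames of the given length in milliseconds.

    Assumption for illustration: frames do not overlap, so rate = 1000 / duration.
    """
    if frame_ms <= 0:
        raise ValueError("frame duration must be positive")
    return 1000.0 / frame_ms


def frame_duration_ms(fps: float) -> float:
    """Inverse conversion: frame length in milliseconds for a given frame rate."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return 1000.0 / fps
```

Under that assumption, a rate of 100 frames per second corresponds to a 10 millisecond frame, and a 2 millisecond frame corresponds to 500 frames per second, which is the relationship relied on above.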

Concerning claim 54, Barclay et al. discloses real-time speech recognition
processing and response (Abstract). 

Concerning claims 71 and 73, Barclay et al. discloses a LISTEN button on a
graphical user interface (GUI) of a client, so that speech recognition starts when a user 
presses a LISTEN button ("in response to a button being pressed on the client system") 
(column 8, lines 48 to 64); when a user clicks on LISTEN, natural language processing 
begins for a user request for an airline reservation ("a query/answer routine") (column 9, 
lines 5 to 30). 

7. Claims 15, 57, 72, and 74 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Barclay et al. in view of Lennig et al., as applied to independent
claims 11 and 53, and further in view of White et al. ('272).

Concerning independent claims 15 and 57, Barclay et al. in view of Lennig et al.
disclose all the limitations of independent claims 11 and 53, but omit "wherein said
second processing routine is configured with an amount of resources by said server 
computing system based on a bandwidth and transmission speed associated with the 
transmission link between said server computing system and said client system so that 
said second processing routine performs accurate recognition of said one or more 
words with a first latency that is less than a second latency that would result if said one 
or more words were recognized by said first signal processing routine and then 
transmitted over said transmission link." However, White et al. ('272) teaches a
distributed voice user interface, where a number of factors are considered in dividing
speech recognition functionality between local devices 14 and remote system 12. 
These factors include an amount of processing and memory capability available at each 
of local devices 14 and remote system 12, the bandwidth of the link between each local 
device 14 and remote system 12, and the kinds of commands expected from the user. 
(Column 5, Lines 23 to 37) Implicitly, by optimizing division of functionalities between a 
local device and a remote system, distributed speech recognition is faster, or has lower 
latencies, than would be present without the optimization ("with a first latency that is less
than a second latency"). An advantage is to provide a bulk of hardware and/or software 
for implementing a sophisticated voice user interface at a single remote system to
substantially reduce costs. (Column 2, Line 65 to Column 3, Line 14) It would have 
been obvious to one having ordinary skill in the art to configure processing routines 
between a client system and a server system based on bandwidth and transmission 
speed to reduce latency as taught by White et al. ('272) in a client/server speech
recognizer of Barclay et al. for a purpose of obtaining a reduction in cost for a 
sophisticated voice user interface. 

Concerning claims 72 and 74, White et al. ('272) teaches that each local device 
14 can be a relatively small, portable, inexpensive, and/or low power-consuming "smart 
device", such as a personal digital assistant (PDA) or a smart telephone (column 5, 
lines 38 to 47); dividing speech recognition functionality between local devices 14 and 
remote system 12 is based on an amount of processing and memory capability 
available at each of local devices 14 (column 5, lines 23 to 37). 

8. Claims 14 and 56 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Barclay et al. in view of Lennig et al. as applied to claims 11 and 53, and further in
view of White et al. ('272) and Huang et al. 

White et al. ('272) discloses evaluating computing resources available at a local 
device 14 ("said client system") and a remote system 12 ("said server"). (Column 5, 
Lines 23 to 37) Lennig et al. teaches that mel frequency cepstra can include
static parameters, C, and dynamic parameters, ΔC. (Column 4, Line 52 to Column 5,
Line 28) However, Lennig et al. does not expressly disclose delta and acceleration 
coefficients for MFCC's, although it is well known that dynamic parameters include delta
and acceleration coefficients for MFCC's. Specifically, Huang et al. teaches LPC 
cepstra analysis, where 12 mel-frequency cepstral coefficients, 12 delta mel frequency 
cepstral coefficients, and 12 delta delta mel-frequency cepstral coefficients are 
employed to model the speech signal. (Column 5, Lines 6 to 20) An objective is to 
provide feature extraction for spectral analysis of speech to represent a portion of an 
utterance. (Column 3, Lines 25 to 43) It would have been obvious to one having 
ordinary skill in the art to employ a representation of mel-frequency cepstral coefficients
("MFCC's") with delta and delta delta ("acceleration") coefficients as taught by Huang et 
al. in a speech recognition method of Lennig et al. for a purpose of representing a 
portion of an utterance by a known method of feature extraction for spectral analysis of 
speech. 
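
The arrangement of static, delta, and delta delta ("acceleration") coefficients discussed above can be sketched as follows. This is a minimal illustration, not the method of Huang et al. or Lennig et al.: the helper names `deltas` and `stack_features` are hypothetical, and the plain two-point difference with a zero first frame is an assumption made here for simplicity (practical recognizers often use regression over a window of frames).

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame]) -> List[Frame]:
    """First-order differences between consecutive frames.

    Assumption for illustration: a two-point difference, with the
    delta of the first frame set to zero.
    """
    if not frames:
        return []
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out


def stack_features(static: List[Frame]) -> List[Frame]:
    """Concatenate static, delta, and delta delta coefficients per frame."""
    d1 = deltas(static)   # delta coefficients
    d2 = deltas(d1)       # delta delta ("acceleration") coefficients
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

With 12 static coefficients per frame, each stacked vector has 36 entries, matching the 12 + 12 + 12 arrangement attributed to Huang et al. above.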

Response to Arguments 

9. Applicants' arguments filed 12 May 2008 have been considered but are moot in
view of the new ground of rejection. 

Applicants' petition to accept an unintentionally delayed claim for a benefit of 
priority was granted on 03 July 2008. The granting of the petition effectively removes 
certain prior art, necessitating herein new grounds of rejection. 

Applicants' petition to revive an unintentionally abandoned application was 
granted on 10 November 2008. 

10. Applicants' arguments filed 12 May 2008 have been fully considered but they are
not persuasive. 

Applicants argue that language directed to on a "connection-by-connection basis" 
is supported by the Specification to traverse the rejection of claims 14 and 56 under 35 
U.S.C. §112, 1st ¶. Applicants direct attention to a Declaration by Dr. Melvyn, and to
Figures 2A, 2C, 2D, and 3 of the Specification, to show disclosure for opening a 
connection, configuration during the connection, and closing the connection. These 
arguments are not persuasive. 

It is agreed that the Specification discloses opening and closing an Internet 
connection to establish a session, but Applicants' originally-filed Specification does not 
show that configuration based on an evaluation of available computing resources 
occurs during the session. That is, the problem is with Step b), which Applicants call, 
"during the connection". Indeed, Applicants' Specification teaches against performing 
the configuration based upon available computing resources during the connection. 
The Specification, ¶[0153], expressly states that initialization of a client system is
comprised of three "separate initializing processes" for Speech Recognition Engine 
220A, MS Agent 220B, and Communication Processes 220C. Moreover, Figures 2A to 
2D clearly show that the boxes representing SR initialization are separate and distinct 
from the boxes for opening and closing an Internet session. Thus, initialization of a 
client speech recognition engine is not disclosed to occur on a "connection-by-connection
basis" between the opening and closing of the Internet connection, but appears to be an 
independent process occurring even before the Internet connection is established. It 
would be reasonable to conclude that the configuration of speech processing at a client
and a server based upon an evaluation of resources is taking place on a fixed basis for
every client device at a time after powering up, when the speech recognition engine is 
being initialized, but is independent of establishing a network connection. 

Therefore, the rejections of claims 14, 56, 72, and 74 under 35 U.S.C. §112, 1st ¶,
as failing to comply with the written description requirement; of claims 11 to 13, 53 to
55, 71, and 73 under 35 U.S.C. §103(a) as being unpatentable over Barclay et al. in
view of Lennig et al.; of claims 15, 57, 72, and 74 under 35 U.S.C. §103(a) as being
unpatentable over Barclay et al. in view of Lennig et al., and further in view of White et
al. ('272); and of claims 14 and 56 under 35 U.S.C. §103(a) as being unpatentable over
Barclay et al. in view of Lennig et al., and further in view of White et al. ('272) and 
Huang et al., are proper. 

Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to MARTIN LERNER whose telephone number is 
(571) 272-7608. The examiner can normally be reached on 8:30 AM to 6:00 PM
Monday to Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David R. Hudspeth, can be reached on (571) 272-7843. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Martin Lerner/ 
Primary Examiner 
Art Unit 2626 
November 18, 2008 



